How to run image classification on an image URL

Yes, you can probably map them all to Pillow images and then cast the column to the Image feature:

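from datasets import Image as ImageFeature, load_dataset
from PIL import Image as PILImage
import requests

def to_pillow(examples):
    # Download every URL in the batch and open it as a Pillow image.
    images = []
    for url in examples['Photo']:
        response = requests.get(url, stream=True)
        response.raise_for_status()  # fail loudly on a broken link
        images.append(PILImage.open(response.raw))

    examples['image'] = images

    return examples

dataset = load_dataset("TheNoob3131/mosquito-data")
dataset = dataset.map(to_pillow, batched=True)

# The two Image classes are aliased above so the datasets feature does not
# shadow PIL's Image inside to_pillow. cast_column expects a feature
# instance, not the class itself.
dataset = dataset.cast_column('image', ImageFeature())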

I’m going to cc @mariosasko here (map seems to be very slow). Alternatively, you can do:

dataset.set_transform(to_pillow)

to do this on the fly.
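
If the slowness comes from downloading the images one at a time (that's my assumption, I haven't profiled it), fetching each batch with a thread pool might help. A rough sketch: `fetch_bytes`, `fetch_all` and `to_pillow_parallel` are names I made up for illustration, and `max_workers=8` and the 10 second timeout are arbitrary values.

```python
from concurrent.futures import ThreadPoolExecutor
from io import BytesIO
from urllib.request import urlopen


def fetch_bytes(url, timeout=10):
    # Download the raw bytes behind one URL (hypothetical helper).
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch=fetch_bytes, max_workers=8):
    # Run the downloads concurrently; executor.map returns results
    # in the same order as the input URLs.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch, urls))


def to_pillow_parallel(examples, fetch=fetch_bytes):
    # Same idea as to_pillow, but the whole batch is downloaded in parallel.
    # Pillow is imported here so the helpers above work without it.
    from PIL import Image as PILImage

    payloads = fetch_all(examples['Photo'], fetch=fetch)
    examples['image'] = [PILImage.open(BytesIO(data)) for data in payloads]
    return examples
```

You would pass it the same way as before, either `dataset.map(to_pillow_parallel, batched=True)` or `dataset.set_transform(to_pillow_parallel)`. Threads only help if the time really goes into waiting on the network, so it's worth timing a small batch first.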
